Subjectivity Lexicon for Czech: Implementation and Improvements

Authors

  • Katerina Veselovská
  • Jan Hajic
  • Jana Sindlerová

Abstract

The aim of this paper is to introduce the Czech subjectivity lexicon, a new lexical resource for sentiment analysis in Czech. We describe the individual stages of the manual refinement of the lexicon and demonstrate its use in state-of-the-art polarity classifiers, namely the Maximum Entropy classifier. We test the success rate of the system enriched with the dictionary on different data sets, compare the results, and suggest some further improvements to the lexicon-based classification system.

Similar resources

Czech Subjectivity Lexicon: A Lexical Resource for Czech Polarity Classification

This paper introduces the Czech subjectivity lexicon, a new lexical resource for sentiment analysis in Czech. The lexicon is a dictionary of 4947 evaluative items annotated with part of speech and tagged with positive or negative polarity. We describe the method for building the basic vocabulary and the criteria for its manual refinement. Also, we suggest possible enrichment of the fundamental l...

Tracing Sentiments: Syntactic and Semantic Features in a Subjectivity Lexicon

In this paper we present a syntactic and semantic analysis of verbal entries in the Czech subjectivity lexicon Czech SubLex 1.0 concerning their semantic and valency properties with respect to the roots and degree of subjectivity and evaluativeness. We demonstrate that evaluative verbs share certain abstract syntactic patterns with valency positions encoding the position of the source and targe...

Subjectivity and Sentiment Analysis of Modern Standard Arabic and Arabic Microblogs

Though much research has been conducted on Subjectivity and Sentiment Analysis (SSA) during the last decade, little work has focused on Arabic. In this work, we focus on SSA for both Modern Standard Arabic (MSA) news articles and dialectal Arabic microblogs from Twitter. We showcase some of the challenges associated with SSA on microblogs. We adopted a random graph walk approach to extend the A...

Improvements to Korektor: A Case Study with Native and Non-Native Czech

We present recent developments of Korektor, a statistical spell checking system. In addition to a lexicon, Korektor uses language models to find real-word errors, which are detectable only in context. The models and error probabilities, learned from error corpora, are also used to suggest the most likely corrections. Korektor was originally trained on a small error corpus and used language models extracted...

Why Words Alone Are Not Enough: Error Analysis of Lexicon-based Polarity Classifier for Czech

The lexicon-based classifier has long been one of the main and most effective methods of polarity classification used in sentiment analysis, i.e. the computational study of opinions, sentiments and emotions expressed in text (see Liu, 2010). Although it also achieves relatively good results for Czech, the classifier still shows some error rate. This paper provides a detailed analysis of such erro...

Journal title:
  • JLCL

Volume 29

Publication date: 2014